


[Figure: accuracy vs. sparsity for the block-aware and original drop criteria compared to the dense baseline; panel (a) WideResNet22-2, plus a second panel for another model.]


Shuffled-block sparse training effectively reduces the execution time of these layers across different sparsities, achieving overall speedups of 1.46x to 5.02x; Figure 1 shows similar speedups for the three models on the CIFAR100 dataset. Figure 14 shows the accuracy of shuffled-block dynamic sparse training with and without our block-aware drop criterion for WideResNet22-2, ResNet18, and VGG16 on the CIFAR10 dataset, and Figure 1 shows a similar pattern for the three models on the CIFAR100 dataset.
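The excerpt names a block-aware drop criterion but does not describe it. The sketch below is one plausible reading, assuming weights are scored and dropped at the granularity of fixed-size blocks so that the surviving mask stays block-sparse; the block size, the L1 scoring rule, and the helper name block_aware_drop are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of a block-aware drop criterion for dynamic sparse training.
# Assumption: each (block x block) tile of the weight matrix receives a single
# score (its L1 norm), and the lowest-scoring tiles are dropped as whole blocks
# to reach the target sparsity, keeping the mask block-sparse.
import torch

def block_aware_drop(weight: torch.Tensor, block: int, sparsity: float) -> torch.Tensor:
    """Return a {0,1} mask that zeroes entire (block x block) tiles of `weight`."""
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0, "weight must tile evenly into blocks"
    # Rearrange into (row_blocks, col_blocks, block, block) tiles.
    tiles = weight.reshape(rows // block, block, cols // block, block).permute(0, 2, 1, 3)
    # One score per tile: the L1 norm of its entries.
    scores = tiles.abs().sum(dim=(2, 3))
    # Keep the highest-scoring tiles; drop the rest to hit the target block sparsity.
    n_blocks = scores.numel()
    n_keep = max(1, int(round((1.0 - sparsity) * n_blocks)))
    threshold = scores.flatten().topk(n_keep).values.min()
    block_mask = (scores >= threshold).to(weight.dtype)
    # Expand the per-block decision back to a full-resolution weight mask.
    return block_mask.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)

# Example: 50% block sparsity on a 64x64 weight matrix with 8x8 blocks.
w = torch.randn(64, 64)
w_sparse = w * block_aware_drop(w, block=8, sparsity=0.5)
```

Dropping whole blocks, rather than individual weights, is what would let the resulting layers map onto block-sparse kernels and yield the execution-time reductions the excerpt reports.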